From features to expression: High-density oligonucleotide array analysis revisited

Authors

  • Felix Naef
  • Daniel A. Lim
  • Nila Patil
  • Marcelo O. Magnasco
Abstract

High-density oligonucleotide (GeneChip®) arrays are among the most popular tools for large-scale gene expression studies. These currently have 16-20 small probes ("features") for evaluating the transcript abundance of each gene. In addition, each probe is accompanied by a mismatched probe (MM) designed as a control for non-specificity. An algorithm is presented to compute comparative expression levels from the intensities of the individual features, based on a statistical study of their distribution. Interestingly, MM probes need not be included in the analysis. We show that our algorithm improves significantly upon the current standard and leads to a substantially larger number of genes brought above the noise floor for further analysis.

Bioinformatics is based on the existence of vast quantities of information of unknown significance whose internal relationships are analyzed using statistical methods. The individual data in these data sets are usually highly inhomogeneous in quality, with the number of elements increasing rapidly for lower quality levels. A recurrent problem in the statistical analysis of such data sets is that, while no sophisticated methods are needed to ascertain the meaning of the few high-quality elements, the bulk of the data often lies near the noise floor, where fairly fancy statistical tools may become necessary. In such circumstances, seemingly innocuous improvements to data treatment may yield large improvements to the analysis simply because of the way the data quality is distributed.

Among the many experimental techniques generating large datasets from biological experiments today, oligonucleotide hybridization arrays have rapidly become a popular tool for large-scale gene expression screens [1, 2]. Currently, DNA hybridization array techniques aim at obtaining several thousand low-quality measurements of transcript abundance in a single parallel experiment. From this "bulk" data, the goal is to identify groups of genes participating in a given pathway and hopefully unravel some features of their transcriptional regulation, to be confirmed by more sensitive and precise methods.

There are currently two main trends in microarray technology: cDNA bicolor glass slides [3, 4] and the high-density oligonucleotide arrays (HDONAs) manufactured by Affymetrix [5, 6]. In the first case, PCR-derived cDNAs from libraries are spotted onto a glass slide as hybridization probes. In the second, hybridization probes consist of chemically synthesized -mer oligonucleotides on a gridded array.

Under the best conditions, one would expect a linear relationship between the measured fluorescence and the concentration of original mRNA. However, the constant of proportionality currently depends strongly on the hybridization sequence. As a consequence, large-scale hybridization experiments do not give quantitative information in a gene-versus-gene fashion for a single preparation; that is, it is not possible to infer the ratio of mRNA concentration for actin to tubulin within a single sample. The meaningful information lies in the ratios of intensities for the same hybridization sequence taken from different samples. Usually one thinks of one sample (e.g. "normal" tissue or unsynchronized cells) as a baseline to which all other conditions are compared.

In what follows, we concentrate exclusively on HDONAs. On these, probes (or features) are grouped into probe sets for a given gene, a probe set consisting of (depending on the gene and the chip series) probe pairs. Each pair is designed to
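The point that only ratios of the same probe across samples are meaningful can be illustrated with a minimal Python sketch. It is not the algorithm presented in the paper. It assumes that each probe's sequence-dependent proportionality constant cancels in the per-probe ratio, and it summarizes a probe set by the median of the log ratios, which is a choice made for this illustration only. The function names, the `floor` parameter, and the intensity values are all hypothetical.

```python
import math
from statistics import median


def log_ratios(pm_baseline, pm_sample, floor=1.0):
    """Per-probe log2 ratios of sample over baseline intensities.

    Each ratio compares the SAME probe in two samples, so the unknown
    sequence-dependent constant of that probe cancels (assumption).
    `floor` is a hypothetical guard against zero or negative intensities.
    """
    if len(pm_baseline) != len(pm_sample):
        raise ValueError("both samples must list the same probes")
    return [
        math.log2(max(s, floor) / max(b, floor))
        for b, s in zip(pm_baseline, pm_sample)
    ]


def probe_set_log_ratio(pm_baseline, pm_sample, floor=1.0):
    """Summarize a probe set by the median of its per-probe log ratios.

    The median is an illustrative robust summary, not the paper's method.
    """
    return median(log_ratios(pm_baseline, pm_sample, floor))


# Made-up perfect-match intensities for one probe set in two samples.
baseline = [120.0, 950.0, 310.0, 4000.0]
sample = [240.0, 1900.0, 620.0, 500.0]

summary = probe_set_log_ratio(baseline, sample)
print(f"log2 ratio: {summary}, fold change: {2 ** summary}")
```

In this made-up example, three probes double while one drops eightfold. The raw intensities within one sample span more than an order of magnitude, yet the per-probe ratios agree for three of the four probes, and the median is not pulled toward the outlying probe.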

Similar resources

Bibliographie Juin 2008

An improved physico-chemical model of hybridization on high-density oligonucleotide microarrays; Comparison of Affymetrix GeneChip expression measures; Evaluation of methods for oligonucleotide array data via quantitative real-time PCR; Gene expression levels assessed by oligonucleotide microa...

Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data.

Algorithms for performing feature extraction and normalization on high-density oligonucleotide gene expression arrays have not been fully explored, and the impact these algorithms have on the downstream analysis is not well understood. Advances in such low-level analysis methods are essential to increase the sensitivity and specificity of detecting whether genes are present and/or differential...

Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis.

Gene expression profiling using high density oligonucleotide arrays is a powerful method to generate an unbiased survey of a cell's transcriptional landscape. Increasingly complex biological questions require that this approach be applicable to the small numbers of cells that are obtained from sources such as laser capture microdissection (LCM) of solid tissues. In this report, we demonstrate t...

HugeIndex: a database with visualization tools for high-density oligonucleotide array data from normal human tissues

High-density oligonucleotide arrays are a powerful tool for uncovering changes in global gene expression in various disease states. To this end, it is essential to first characterize the variations of gene expression in normal physiological processes. We established the Human Gene Expression (HuGE) Index database (www.HugeIndex.org) to serve as a public repository for gene expression data on no...

Publication date: 2001